Files API #9

Open
wants to merge 1 commit into
base: master

dtkav
Contributor

@dtkav dtkav commented Mar 20, 2019

This API relies on my fork of datalake-common library (datalake-common-dtkav).
planetlabs/datalake-common@master...dtkav:dist

The goal of the Files API is to:

  1. Provide a centralized and searchable place to store file metadata of all kinds, especially logs, particularly along the time-axis. (DatalakeFile model)
  2. Provide a content-addressed backend that de-duplicates file contents and compresses them (s3).
  3. Provide a managed set of allowed 'what' values, which an admin can provision without deploying MC. This is done via the What model.
  4. Provide a mechanism for relating files to MC objects. This is done via RelatedFile model, and work-id field on the DatalakeFile model.
  5. Store metadata alongside data to allow repopulating the database in case of catastrophic failure.

Design:
DatalakeFile objects are just metadata, and each contains a pointer to the content-id (cid) of a file.
The content-id is: multibase('base32', multihash('blake2b-16', blake2b(file_contents, digest_size=16)))

The work-id field can be any string (so users can include their own prefixes).
However, there is a special set of work-id prefixes that correspond to objects in mission control.
Adding files with these prefixes will make the files available from those objects directly.
A work-id looks like this: mc-<model_name>.<uuid>.
Alternately, a user might be running jobs with jenkins and use something like: jenkins-job.<id>. These files would not be related to any objects in mission control, but users in the ecosystem can still leverage the datalake to store all of the files in one place.
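
As an illustration of the work-id convention, the sketch below separates mission control work-ids from free-form ones. It is only an example, not the code in this PR: the function name parse_mc_work_id is made up, and it assumes the model name contains no dots and that the second part must parse as a UUID.

```python
import re
import uuid
from typing import Optional, Tuple

# Assumed shape: mc-<model_name>.<uuid>
_MC_WORK_ID = re.compile(r"^mc-(?P<model_name>[^.]+)\.(?P<uuid>[^.]+)$")


def parse_mc_work_id(work_id: str) -> Optional[Tuple[str, uuid.UUID]]:
    """Return (model_name, uuid) for mission control work-ids, else None."""
    match = _MC_WORK_ID.match(work_id)
    if match is None:
        return None  # free-form work-id, e.g. "jenkins-job.<id>"
    try:
        object_id = uuid.UUID(match.group("uuid"))
    except ValueError:
        return None  # has the mc- prefix but the id is not a valid UUID
    return match.group("model_name"), object_id
```

A work-id that returns None here would still be stored on the DatalakeFile; it just would not be related to any mission control object.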

@dtkav
Contributor Author

dtkav commented Mar 20, 2019

This is to the point where it could use some proper review. I had a lot of lessons-learned along the way.
The main changes: I've set version back to the metadata version, because versioning is done by time (the latest file in time is the latest version).
Also, the CID is no longer unique, as multiple metadata records can happily reference the same file contents.

@dtkav dtkav force-pushed the files branch 8 times, most recently from 522d143 to 65667f2 Compare March 29, 2019 06:44
@dtkav dtkav changed the title WIP: Files API Files API Mar 29, 2019
@dtkav
Contributor Author

dtkav commented Mar 29, 2019

I just noticed the files api uses start and end, whereas the rest of the API uses start_time and end_time. :/

        elif isinstance(o, datetime.timedelta):
            return duration_iso_string(o)
        elif isinstance(o, (decimal.Decimal, uuid.UUID, Promise)):
            return str(o)
Member

str of a Promise doesn't sound right?
Also Promise doesn't seem to be defined anywhere?

return data


class ISODateTimeField(models.DateTimeField):
Member

I feel there must be a better way to do this than copy-pasting this class across Django apps.

@@ -73,7 +73,7 @@ def to_dict(self):
         opts = self._meta
         data = {}
         for f in chain(opts.concrete_fields, opts.private_fields, opts.many_to_many):
-            if f.name is 'id':
+            if f.name == 'id':
Member

Good catch :\

/datalake/admin/whats/:
  get:
    security:
      - jwt: ['admins']
Member

I'm unclear what this part of the tag will do?
What defines who an 'admin' is?

Member

admins doesn't appear to be defined anywhere else?

2 participants